Results 1 - 20 of 39
1.
PLoS Comput Biol ; 19(3): e1010897, 2023 03.
Article in English | MEDLINE | ID: covidwho-2253748

ABSTRACT

The coalescent is a powerful statistical framework that allows us to infer past population dynamics leveraging the ancestral relationships reconstructed from sampled molecular sequence data. In many biomedical applications, such as in the study of infectious diseases, cell development, and tumorigenesis, several distinct populations share evolutionary history and therefore become dependent. The inference of such dependence is a highly important, yet challenging problem. With advances in sequencing technologies, we are well positioned to exploit the wealth of high-resolution biological data for tackling this problem. Here, we present adaPop, a probabilistic model to estimate past population dynamics of dependent populations and to quantify their degree of dependence. An essential feature of our approach is the ability to track the time-varying association between the populations while making minimal assumptions on their functional shapes via Markov random field priors. We provide nonparametric estimators, extensions of our base model that integrate multiple data sources, and fast scalable inference algorithms. We test our method using simulated data under various dependent population histories and demonstrate the utility of our model in shedding light on evolutionary histories of different variants of SARS-CoV-2.
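The abstract does not spell out adaPop's likelihood. As background only, the sketch below shows the single-population Kingman coalescent likelihood that coalescent-based estimators of this kind build on, assuming isochronous samples and a constant effective population size `ne`. The function names are illustrative, and adaPop's two-population dependence and Markov random field priors are not modelled.

```python
import math
from typing import Sequence


def coalescent_loglik(intervals: Sequence[float], ne: float) -> float:
    """Kingman coalescent log-likelihood for isochronous samples.

    intervals[i] is the waiting time while k = n - i lineages remain, so
    len(intervals) == n - 1 and every interval ends in a coalescence.
    Assumes a constant effective population size `ne`, with time scaled so
    that each pair of lineages coalesces at rate 1 / ne.
    """
    n = len(intervals) + 1
    loglik = 0.0
    for i, t in enumerate(intervals):
        k = n - i
        rate = math.comb(k, 2) / ne          # total coalescence rate with k lineages
        loglik += math.log(rate) - rate * t  # log of the exponential waiting-time density
    return loglik


def ne_mle(intervals: Sequence[float]) -> float:
    """Closed-form MLE of a constant ne: sum of C(k, 2) * t_k divided by (n - 1)."""
    n = len(intervals) + 1
    total = sum(math.comb(n - i, 2) * t for i, t in enumerate(intervals))
    return total / (n - 1)
```

With `intervals = [0.1, 0.3, 1.0]` (four sampled lineages), `ne_mle` returns (6·0.1 + 3·0.3 + 1·1.0)/3 ≈ 0.833, and the log-likelihood is lower at any other constant `ne`.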


Subject(s)
COVID-19 , Humans , Bayes Theorem , COVID-19/epidemiology , SARS-CoV-2/genetics , Population Dynamics , Models, Statistical , Algorithms , Models, Genetic , Genetics, Population
3.
Genet Epidemiol ; 47(3): 215-230, 2023 04.
Article in English | MEDLINE | ID: covidwho-2208982

ABSTRACT

Analysis of host genetic components provides insights into the susceptibility and response to viral infection such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19). To reveal genetic determinants of susceptibility to COVID-19 related mortality, we train a deep learning model to identify groups of genetic variants and their interactions that contribute to the COVID-19 related mortality risk using the UK Biobank data (28,097 affected cases and 1656 deaths). We refer to such groups of variants as super variants. We identify 15 super variants with various levels of significance as susceptibility loci for COVID-19 mortality. Specifically, we identify a super variant (odds ratio [OR] = 1.594, p = 5.47 × 10⁻⁹) on Chromosome 7 that consists of the minor allele of rs76398985, rs6943608, rs2052130, 7:150989011_CT_C, rs118033050, and rs12540488. We also discover a super variant (OR = 1.353, p = 2.87 × 10⁻⁸) on Chromosome 5 that contains rs12517344, rs72733036, rs190052994, rs34723029, rs72734818, 5:9305797_GTA_G, and rs180899355.
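The odds ratios above come from the authors' deep learning pipeline on UK Biobank data. For readers unfamiliar with the measure, here is a generic sketch of an odds ratio with a Wald 95% interval from a 2×2 carrier-by-outcome table. The function is illustrative and not the paper's method, and the counts in the usage line are hypothetical.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: carriers who died,      b: carriers who survived,
    c: non-carriers who died,  d: non-carriers who survived.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

For example, `odds_ratio_wald(20, 80, 10, 90)` gives an odds ratio of (20·90)/(80·10) = 2.25 together with its interval bounds.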


Subject(s)
COVID-19 , Deep Learning , Humans , SARS-CoV-2 , Biological Specimen Banks , Models, Genetic , United Kingdom
4.
BMC Bioinformatics ; 23(1): 173, 2022 May 11.
Article in English | MEDLINE | ID: covidwho-1846791

ABSTRACT

BACKGROUND: Boolean networks (BNs) provide an effective modelling formalism for various complex biochemical phenomena. Their long-term behaviour is represented by attractors, subsets of the state space towards which the BN eventually converges. These are then typically linked to different biological phenotypes. Depending on various logical parameters, the structure and quality of attractors can undergo a significant change, known as a bifurcation. We present a methodology for analysing bifurcations in asynchronous parametrised Boolean networks. RESULTS: In this paper, we propose a computational framework employing advanced symbolic graph algorithms that enable the analysis of large networks with hundreds of Boolean variables. To visualise the results of this analysis, we developed a novel interactive presentation technique based on decision trees, allowing us to quickly uncover parameters crucial to the changes in the attractor landscape. As a whole, the methodology is implemented in our tool AEON. We evaluate the method's applicability on a complex human cell signalling network describing the activity of type-1 interferons and related molecules interacting with the SARS-CoV-2 virion. In particular, the analysis focuses on explaining the potential suppressive role of the recently proposed drug molecule GRL0617 on replication of the virus. CONCLUSIONS: The proposed method creates a working analogy to the concept of bifurcation analysis widely used in kinetic modelling to reveal the impact of parameters on the system's stability. An important feature of our tool is its unique capability to work fast with large-scale networks with a relatively large extent of unknown information. The results obtained in the case study are in agreement with the recent biological findings.
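In the asynchronous semantics, a network state moves by updating one variable at a time, and attractors are the terminal strongly connected components of the resulting state transition graph. AEON handles parametrised networks symbolically; the sketch below is only an explicit-state, brute-force illustration for a small network with fixed update functions (the `rules` list is a hypothetical input format), feasible for about a dozen variables at most.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

State = Tuple[int, ...]
Update = Callable[[State], int]  # returns the next value of one variable


def async_successors(state: State, rules: List[Update]) -> Set[State]:
    """Asynchronous semantics: each successor changes exactly one variable."""
    succ = set()
    for i, f in enumerate(rules):
        new = f(state)
        if new != state[i]:
            s = list(state)
            s[i] = new
            succ.add(tuple(s))
    return succ


def attractors(rules: List[Update]) -> List[Set[State]]:
    """Attractors are the terminal strongly connected components of the state graph.

    Brute force over all 2^n states; only suitable for small n.
    """
    n = len(rules)
    states = list(product((0, 1), repeat=n))
    graph = {s: async_successors(s, rules) for s in states}

    def reachable(src: State) -> Set[State]:
        seen, stack = {src}, [src]
        while stack:
            for t in graph[stack.pop()]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    reach = {s: reachable(s) for s in states}
    result: List[Set[State]] = []
    for s in states:
        # s lies in a terminal SCC iff every state it can reach can reach it back;
        # that SCC is then exactly the set of states reachable from s
        if all(s in reach[t] for t in reach[s]):
            comp = reach[s]
            if comp not in result:
                result.append(comp)
    return result
```

A mutual-inhibition pair, `rules = [lambda s: 1 - s[1], lambda s: 1 - s[0]]`, yields two fixed-point attractors, (1, 0) and (0, 1); a negative feedback loop, `[lambda s: 1 - s[1], lambda s: s[0]]`, yields a single cyclic attractor containing all four states.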


Subject(s)
COVID-19 , Gene Regulatory Networks , Algorithms , Aniline Compounds , Benzamides , Humans , Models, Genetic , Naphthalenes , SARS-CoV-2
5.
Syst Biol ; 71(6): 1549-1560, 2022 10 12.
Article in English | MEDLINE | ID: covidwho-1713733

ABSTRACT

We present a two-headed approach called Bayesian Integrated Coalescent Epoch PlotS (BICEPS) for efficient inference of coalescent epoch models. Firstly, we integrate out population size parameters, and secondly, we introduce a set of more powerful Markov chain Monte Carlo (MCMC) proposals for flexing and stretching trees. Even though population sizes are integrated out and not explicitly sampled through MCMC, we are still able to generate samples from the population size posteriors. This allows demographic reconstruction through time and estimating the timing and magnitude of population bottlenecks and full population histories. Altogether, BICEPS can be considered a more muscular version of the popular Bayesian skyline model. We demonstrate its power and correctness by a well-calibrated simulation study. Furthermore, we demonstrate with an application to SARS-CoV-2 genomic data that some analyses that have trouble converging with the traditional Bayesian skyline prior and standard MCMC proposals can do well with the BICEPS approach. BICEPS is available as an open-source package for BEAST 2 under the GPL license and has a user-friendly graphical user interface. [Bayesian phylogenetics; BEAST 2; BICEPS; coalescent model.]
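Integrating out a population size is possible in closed form when its prior is conjugate to the coalescent likelihood. A minimal sketch, assuming an inverse-gamma(α, β) prior on the population size θ of one epoch (an assumption here; the BICEPS package documents its own prior): if the epoch contains m coalescent events and s is the sum of C(k, 2)·Δt over its intervals, the labelled-genealogy likelihood is θ^(-m) exp(-s/θ), the marginal is analytic, and θ can still be sampled afterwards from its inverse-gamma(α + m, β + s) posterior. Function names are hypothetical.

```python
import math
import random


def log_marginal_epoch(m: int, s: float, alpha: float, beta: float) -> float:
    """log of the integral of theta^(-m) * exp(-s / theta) * InvGamma(theta; alpha, beta).

    m: coalescent events in the epoch
    s: sum over the epoch's intervals of C(k, 2) * interval length
    Assumes an inverse-gamma(alpha, beta) prior on theta (hypothetical choice).
    """
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + math.lgamma(alpha + m) - (alpha + m) * math.log(beta + s))


def sample_theta(m: int, s: float, alpha: float, beta: float,
                 rng: random.Random) -> float:
    """Draw theta from its InvGamma(alpha + m, beta + s) posterior.

    If X ~ Gamma(shape a, rate b) then 1 / X ~ InvGamma(a, b);
    random.gammavariate takes (shape, scale), so scale = 1 / rate.
    """
    return 1.0 / rng.gammavariate(alpha + m, 1.0 / (beta + s))
```

Because the marginal depends on the tree only through m and s per epoch, an MCMC over trees can evaluate it without proposing population sizes, and posterior population sizes can be drawn per sampled tree as a post-processing step.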


Subject(s)
COVID-19 , Software , Algorithms , Bayes Theorem , Humans , Markov Chains , Models, Genetic , Monte Carlo Method , Phylogeny , SARS-CoV-2
6.
Genet Epidemiol ; 46(3-4): 159-169, 2022 04.
Article in English | MEDLINE | ID: covidwho-1699896

ABSTRACT

Mendelian randomization (MR) is a statistical method exploiting genetic variants as instrumental variables to estimate the causal effect of modifiable risk factors on an outcome of interest. Despite the wide use of popular two-sample MR methods based on genome-wide association study summary-level data, those methods can suffer from power loss and/or biased inference when the chosen genetic variants are in linkage disequilibrium (LD) and have relatively large direct effects on the outcome whose distribution might be heavy-tailed, a phenomenon commonly referred to as idiosyncratic pleiotropy. To resolve those two issues, we propose a novel Robust Bayesian Mendelian Randomization (RBMR) model that uses the more robust multivariate generalized t-distribution to model such direct effects in a probabilistic model framework that can also incorporate the LD structure explicitly. The generalized t-distribution can be represented as a Gaussian scale mixture so that our model parameters can be estimated by expectation maximization (EM)-type algorithms. We compute the standard errors by calibrating the evidence lower bound using the likelihood ratio test. Through extensive simulation studies, we show that our RBMR has robust performance compared with other competing methods. We further apply our RBMR method to two benchmark data sets and find that RBMR has smaller bias and standard errors. Using our proposed RBMR method, we find that coronary artery disease is associated with increased risk of critically ill coronavirus disease 2019. We also develop a user-friendly R package RBMR (https://github.com/AnqiWang2021/RBMR) for public use.
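The Gaussian-scale-mixture trick that makes EM possible can be shown in the simplest setting. This sketch assumes a univariate Student-t with fixed degrees of freedom `nu`, not the multivariate generalized t with LD structure used by RBMR, and it is not the RBMR package's code: each observation gets a latent precision weight, the E-step downweights points far from the current location, and the M-step is a weighted mean and variance.

```python
import statistics
from typing import List, Tuple


def t_location_em(x: List[float], nu: float = 4.0, iters: int = 200) -> Tuple[float, float]:
    """Location and squared scale of a univariate Student-t by EM (nu held fixed).

    Scale-mixture form: x_i | w_i ~ N(mu, sigma2 / w_i), w_i ~ Gamma(nu/2, rate nu/2),
    so the E-step weight is E[w_i | x_i] = (nu + 1) / (nu + (x_i - mu)^2 / sigma2).
    """
    mu = statistics.median(x)
    mad = statistics.median([abs(v - mu) for v in x])
    sigma2 = (1.4826 * mad) ** 2 or 1.0  # robust start; fall back to 1.0 if MAD is 0
    for _ in range(iters):
        # E-step: expected latent precisions; outliers get small weights
        w = [(nu + 1) / (nu + (v - mu) ** 2 / sigma2) for v in x]
        # M-step: weighted mean and weighted variance
        mu = sum(wi * v for wi, v in zip(w, x)) / sum(w)
        sigma2 = sum(wi * (v - mu) ** 2 for wi, v in zip(w, x)) / len(x)
    return mu, sigma2
```

On `[0.0, 0.1, -0.1, 0.05, -0.05, 100.0]` the estimated location stays near 0, while the plain mean is about 16.7, which is the kind of robustness to a few large direct effects the abstract is after.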


Subject(s)
COVID-19 , Mendelian Randomization Analysis , Bayes Theorem , COVID-19/genetics , Genetic Pleiotropy , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Mendelian Randomization Analysis/methods , Models, Genetic
7.
Int J Mol Sci ; 23(5)2022 Feb 22.
Article in English | MEDLINE | ID: covidwho-1699203

ABSTRACT

Since December 2019, a pandemic of COVID-19 disease, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has rapidly spread across the globe. At present, the Food and Drug Administration (FDA) has issued emergency approval for the use of some antiviral drugs. However, these drugs still have limitations in the specific treatment of COVID-19, and as such, new treatment strategies urgently need to be developed. RNA-interference-based gene therapy provides a tractable target for antiviral treatment. Ensuring cell-specific targeted delivery is important to the success of gene therapy. The use of nanoparticles (NPs) as carriers for the delivery of small interfering RNAs (siRNAs) to specific tissues or organs of the human body could play a crucial role in the specific therapy of severe respiratory infections, such as COVID-19. In this review, we describe a variety of novel nanocarriers, such as lipid NPs, star polymer NPs, and glycogen NPs, and summarize the pre-clinical/clinical progress of these nanoparticle platforms in siRNA delivery. We also discuss the application of various NP-encapsulated siRNAs as therapeutics for SARS-CoV-2 infection, the challenges of targeting these therapeutics for local delivery to the lung, and various inhalation devices used for therapeutic administration. We also discuss currently available animal models that are used for preclinical assessment of RNA-interference-based gene therapy. Advances in this field have the potential for antiviral treatments of COVID-19 disease and could be adapted to treat a range of respiratory diseases.


Subject(s)
COVID-19/therapy , Drug Delivery Systems/methods , Nanoparticles/administration & dosage , RNA, Small Interfering/administration & dosage , RNAi Therapeutics/methods , Animals , COVID-19/epidemiology , COVID-19/virology , Humans , Models, Genetic , Nanoparticles/chemistry , Pandemics/prevention & control , RNA, Small Interfering/chemistry , RNA, Small Interfering/genetics , SARS-CoV-2/physiology
8.
J Cell Mol Med ; 26(5): 1445-1455, 2022 03.
Article in English | MEDLINE | ID: covidwho-1642687

ABSTRACT

There is an unmet need for models that predict morbidity and mortality of coronavirus disease 2019 (COVID-19) early. We aimed to a) identify complement-related genetic variants associated with the clinical outcomes of ICU hospitalization and death, b) develop an artificial neural network (ANN) predicting these outcomes and c) validate whether complement-related variants are associated with an impaired complement phenotype. We prospectively recruited consecutive adult patients of Caucasian origin, hospitalized due to COVID-19. Through targeted next-generation sequencing, we identified variants in complement factor H/CFH, CFB, CFH-related, CFD, CD55, C3, C5, CFI, CD46, thrombomodulin/THBD, and A Disintegrin and Metalloproteinase with Thrombospondin motifs (ADAMTS13). Among 381 variants in 133 patients, we identified 5 critical variants associated with severe COVID-19: rs2547438 (C3), rs2250656 (C3), rs1042580 (THBD), rs800292 (CFH) and rs414628 (CFHR1). Using age, gender and presence or absence of each variant, we developed an ANN predicting morbidity and mortality in 89.47% of the examined population. Furthermore, THBD and C3a levels were significantly increased in severe COVID-19 patients and those harbouring relevant variants. Thus, we reveal for the first time an ANN accurately predicting ICU hospitalization and death in COVID-19 patients, based on genetic variants in complement genes, age and gender. Importantly, we confirm that genetic dysregulation is associated with impaired complement phenotype.


Subject(s)
COVID-19/genetics , COVID-19/mortality , Neural Networks, Computer , COVID-19/epidemiology , Complement Activation/genetics , Complement Factor H/genetics , Complement System Proteins/genetics , Female , Greece/epidemiology , Hospitalization/statistics & numerical data , Humans , Intensive Care Units/statistics & numerical data , Male , Middle Aged , Models, Genetic , Morbidity , Polymorphism, Single Nucleotide , Thrombomodulin/genetics
9.
PLoS Comput Biol ; 18(1): e1009804, 2022 01.
Article in English | MEDLINE | ID: covidwho-1637205

ABSTRACT

Nonstructural protein 1 (nsp1) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a 180-residue protein that blocks translation of host mRNAs in SARS-CoV-2-infected cells. Although it is known that SARS-CoV-2's own RNA evades nsp1's host translation shutoff, the molecular mechanism underlying the evasion was poorly understood. We performed an extended ensemble molecular dynamics simulation to investigate the mechanism of the viral RNA evasion. Simulation results suggested that the stem loop structure of the SARS-CoV-2 RNA 5'-untranslated region (SL1) binds to both nsp1's N-terminal globular region and intrinsically disordered region. The consistency of the results was assessed by modeling nsp1-40S ribosome structure based on reported nsp1 experiments, including the X-ray crystallographic structure analysis, the cryo-EM electron density map, and cross-linking experiments. The SL1 binding region predicted from the simulation was open to the solvent, yet the ribosome could interact with SL1. Cluster analysis of the binding mode and detailed analysis of the binding poses suggest residues Arg124, Lys47, Arg43, and Asn126 may be involved in the SL1 recognition mechanism, consistent with the existing mutational analysis.


Subject(s)
COVID-19/virology , Host-Pathogen Interactions/genetics , SARS-CoV-2 , Untranslated Regions/genetics , Viral Nonstructural Proteins , Computational Biology , Humans , Models, Genetic , Molecular Dynamics Simulation , Protein Binding , Protein Biosynthesis , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Viral Nonstructural Proteins/chemistry , Viral Nonstructural Proteins/genetics , Viral Nonstructural Proteins/metabolism
10.
Front Immunol ; 12: 789317, 2021.
Article in English | MEDLINE | ID: covidwho-1593957

ABSTRACT

Background: The recent emergence of COVID-19, rapid worldwide spread, and incomplete knowledge of molecular mechanisms underlying SARS-CoV-2 infection have limited development of therapeutic strategies. Our objective was to systematically investigate molecular regulatory mechanisms of COVID-19, using a combination of high throughput RNA-sequencing-based transcriptomics and systems biology approaches. Methods: RNA-Seq data from peripheral blood mononuclear cells (PBMCs) of healthy persons and 17 mild and severe COVID-19 patients were analyzed to generate a gene expression matrix. Weighted gene co-expression network analysis (WGCNA) was used to identify co-expression modules in healthy samples as a reference set. For differential co-expression network analysis, module preservation and module-trait relationship (MTR) approaches were used to identify key modules. Then, protein-protein interaction (PPI) networks, based on co-expressed hub genes, were constructed to identify hub genes/TFs with the highest information transfer (hub-high traffic genes) within candidate modules. Results: Based on differential co-expression network analysis, connectivity patterns and network density, 72% (15 of 21) of modules identified in healthy samples were altered by SARS-CoV-2 infection. Therefore, SARS-CoV-2 caused systemic perturbations in host biological gene networks. In functional enrichment analysis, among 15 non-preserved modules and two significant highly-correlated modules (identified by MTRs), 9 modules were directly related to the host immune response and COVID-19 immunopathogenesis.
Intriguingly, systemic investigation of SARS-CoV-2 infection identified signaling pathways and key genes/proteins associated with COVID-19's main hallmarks, e.g., cytokine storm, acute respiratory distress syndrome (ARDS), acute lung injury (ALI), lymphopenia, coagulation disorders, thrombosis, and pregnancy complications, as well as comorbidities associated with COVID-19, e.g., asthma, diabetic complications, cardiovascular diseases (CVDs), liver disorders and acute kidney injury (AKI). Topological analysis with betweenness centrality (BC) identified 290 hub-high traffic genes, central in both co-expression and PPI networks. We also identified several transcriptional regulatory factors, including NFKB1, HIF1A, AHR, and TP53, with important immunoregulatory roles in SARS-CoV-2 infection. Moreover, several hub-high traffic genes, including IL6, IL1B, IL10, TNF, SOCS1, SOCS3, ICAM1, PTEN, RHOA, GDI2, SUMO1, CASP1, IRAK3, HSPA5, ADRB2, PRF1, GZMB, OASL, CCL5, HSP90AA1, HSPD1, IFNG, MAPK1, RAB5A, and TNFRSF1A had the highest rates of information transfer in 9 candidate modules and central roles in COVID-19 immunopathogenesis. Conclusion: This study provides comprehensive information on molecular mechanisms of SARS-CoV-2-host interactions and identifies several hub-high traffic genes as promising therapeutic targets for the COVID-19 pandemic.
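Betweenness centrality (BC), used above to pick out hub-high traffic genes, scores each node by the fraction of shortest paths between other node pairs that pass through it. A compact sketch of Brandes' algorithm for an unweighted, undirected network follows; the node names in the test are placeholders, and the WGCNA and PPI steps that build the network are not shown.

```python
from collections import deque
from typing import Dict, List


def betweenness(adj: Dict[str, List[str]]) -> Dict[str, float]:
    """Brandes' algorithm for unweighted, undirected graphs (unnormalised scores)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors
        order: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate pair dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # each unordered pair was counted once from each endpoint
    return {v: c / 2 for v, c in bc.items()}
```

On a star with one centre and four leaves, the centre scores C(4, 2) = 6 because every leaf-to-leaf shortest path runs through it, which is the "high traffic" intuition behind the term.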


Subject(s)
COVID-19/genetics , Gene Expression Profiling/methods , Signal Transduction/genetics , Transcription Factors/genetics , Transcriptome/genetics , COVID-19/epidemiology , COVID-19/virology , Cluster Analysis , Gene Ontology , Gene Regulatory Networks , Humans , Immunity/genetics , Models, Genetic , Pandemics , Protein Interaction Maps/genetics , SARS-CoV-2/physiology
11.
Cells ; 11(1)2021 12 28.
Article in English | MEDLINE | ID: covidwho-1580991

ABSTRACT

Coronavirus disease (COVID-19) spreads mainly through close contact of infected persons, but the molecular mechanisms underlying its pathogenesis and transmission remain unknown. Here, we propose a statistical physics model to coalesce all molecular entities into a cohesive network in which the roadmap of how each entity mediates the disease can be characterized. We argue that the process by which a transmitter transfers the virus to a recipient constitutes a triad unit that propagates COVID-19 along reticulate paths. Intrinsically, person-to-person transmissibility may be mediated by how genes interact transversely across transmitter, recipient, and viral genomes. We integrate quantitative genetic theory into hypergraph theory to code the main effects of the three genomes as nodes, pairwise cross-genome epistasis as edges, and high-order cross-genome epistasis as hyperedges in a series of mobile hypergraphs. Charting a genome-wide atlas of horizontally epistatic hypergraphs can facilitate the systematic characterization of the community genetic mechanisms underlying COVID-19 spread. This atlas can help design effective containment and mitigation strategies and screen and triage more susceptible persons and asymptomatic carriers who transmit the virus during incubation.


Subject(s)
COVID-19/transmission , Gene Expression Regulation , Genome, Viral/genetics , Genomics/methods , SARS-CoV-2/genetics , Algorithms , COVID-19/epidemiology , COVID-19/virology , Epistasis, Genetic , Genome-Wide Association Study/methods , Humans , Models, Genetic , Pandemics , SARS-CoV-2/pathogenicity , Virulence/genetics
12.
Infect Genet Evol ; 96: 105106, 2021 12.
Article in English | MEDLINE | ID: covidwho-1506080

ABSTRACT

Coronaviruses (especially SARS-CoV-2) are characterized by rapid mutation and widespread transmission. As these characteristics easily lead to global pandemics, studying the evolutionary relationship between viruses is essential for clinical diagnosis. DNA sequencing has played an important role in evolutionary analysis. Recent alignment-free methods can overcome the problems of traditional alignment-based methods, which are costly in both time and memory. This paper proposes a novel alignment-free method called the correlation coefficient feature vector (CCFV), which defines a correlation measure of the L-step delay of a nucleotide location from its location in the original DNA sequence. The numerical feature is a 16×L-dimensional numerical vector describing the distribution characteristics of the nucleotide positions in a DNA sequence. The proposed L-step delay correlation measure is interestingly related to some types of (L+1)-spaced mers. Unlike traditional gene comparison, our method avoids the computational complexity of multiple sequence alignment, and hence improves the speed of sequence comparison. Our method is applied to evolutionary analysis of the common human viruses including SARS-CoV-2, Dengue virus, Hepatitis B virus, and human rhinovirus and achieves the same or even better results than alignment-based methods. Especially for SARS-CoV-2, our method also confirms that bats are potential intermediate hosts of SARS-CoV-2.
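The abstract gives only the shape of the CCFV feature (a 16×L-dimensional vector built from L-step delays). The sketch below is one plausible reading, assumed rather than taken from the paper: for each delay l = 1..L and each ordered nucleotide pair (a, b), it takes the Pearson correlation between the indicator of base a at position i and of base b at position i + l. The Euclidean distance used to compare sequences is likewise an illustrative choice, and all names are hypothetical.

```python
import math
from typing import List

BASES = "ACGT"


def _pearson(x: List[int], y: List[int]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def delay_correlation_vector(seq: str, L: int) -> List[float]:
    """16 * L features: for each delay l = 1..L and base pair (a, b), the correlation
    between 'base a at position i' and 'base b at position i + l'.
    Assumes len(seq) > L + 1 and an A/C/G/T alphabet (hypothetical reading of CCFV)."""
    seq = seq.upper()
    ind = {b: [1 if c == b else 0 for c in seq] for b in BASES}
    feats = []
    for l in range(1, L + 1):
        for a in BASES:
            for b in BASES:
                feats.append(_pearson(ind[a][:-l], ind[b][l:]))
    return feats


def feature_distance(s1: str, s2: str, L: int) -> float:
    """Alignment-free dissimilarity: Euclidean distance between feature vectors."""
    v1, v2 = delay_correlation_vector(s1, L), delay_correlation_vector(s2, L)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
```

The cost is linear in sequence length for each of the 16·L features, and sequences of different lengths map to vectors of the same size, which is what lets such features replace a multiple sequence alignment.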


Subject(s)
Genome, Viral/genetics , Phylogeny , Sequence Analysis, DNA/methods , Coronavirus/genetics , Dengue Virus/genetics , Hepatitis B/genetics , Humans , Models, Genetic , Rhinovirus/genetics , SARS-CoV-2/genetics , Sequence Alignment
13.
Cancer Prev Res (Phila) ; 14(11): 1021-1032, 2021 11.
Article in English | MEDLINE | ID: covidwho-1463067

ABSTRACT

Up to 10% of patients with pancreatic ductal adenocarcinoma (PDAC) carry underlying germline pathogenic variants in cancer susceptibility genes. The GENetic Education Risk Assessment and TEsting (GENERATE) study aimed to evaluate novel methods of genetic education and testing in relatives of patients with PDAC. Eligible individuals had a family history of PDAC and a relative with a germline pathogenic variant in APC, ATM, BRCA1, BRCA2, CDKN2A, EPCAM, MLH1, MSH2, MSH6, PALB2, PMS2, STK11, or TP53 genes. Participants were recruited at six academic cancer centers and through social media campaigns and patient advocacy efforts. Enrollment occurred via the study website (https://GENERATEstudy.org) and all participation, including collecting a saliva sample for genetic testing, could be done from home. Participants were randomized to one of two remote methods that delivered genetic education about the risks of inherited PDAC and strategies for surveillance. The primary outcome of the study was uptake of genetic testing. From 5/8/2019 to 5/6/2020, 49 participants were randomized to each of the intervention arms. Overall, 90 of 98 (92%) randomized participants completed genetic testing. The most frequently detected pathogenic variants included those in BRCA2 (N = 15, 17%), ATM (N = 11, 12%), and CDKN2A (N = 4, 4%). Participation in the study remained steady throughout the onset of the Coronavirus disease (COVID-19) pandemic. Preliminary data from the GENERATE study indicate success of remote alternatives to traditional cascade testing, with genetic testing rates over 90% and a high rate of identification of germline pathogenic variant carriers who would be ideal candidates for PDAC interception approaches.
PREVENTION RELEVANCE: Preliminary data from the GENERATE study indicate success of remote alternatives for pancreatic cancer genetic testing and education, with genetic testing uptake rates over 90% and a high rate of identification of germline pathogenic variant carriers who would be ideal candidates for pancreatic cancer interception.


Subject(s)
BRCA1 Protein/genetics , BRCA2 Protein/genetics , Genetic Predisposition to Disease , Genetic Testing/methods , Germ-Line Mutation , Pancreatic Neoplasms/genetics , Risk Assessment/methods , Adolescent , Adult , Aged , Aged, 80 and over , Carcinoma, Pancreatic Ductal/genetics , Carcinoma, Pancreatic Ductal/pathology , Carcinoma, Pancreatic Ductal/therapy , Female , Humans , Male , Middle Aged , Models, Genetic , Pancreatic Neoplasms/pathology , Pancreatic Neoplasms/therapy , Patient Participation , Risk Factors , Surveys and Questionnaires , Telemedicine , Young Adult
14.
Infect Genet Evol ; 95: 104812, 2021 11.
Article in English | MEDLINE | ID: covidwho-1461688

ABSTRACT

While the COVID-19 pandemic continues to spread with currently more than 117 million cumulated cases and 2.6 million deaths worldwide as of March 2021, its origin is still debated. Although several hypotheses have been proposed, there is still no clear explanation about how its causative agent, SARS-CoV-2, emerged in human populations. Today, scientifically valid facts that deserve to be debated still coexist with unverified statements, thus blurring knowledge of the origin of COVID-19. Our retrospective analysis of scientific data supports the hypothesis that SARS-CoV-2 is indeed a naturally occurring virus. However, the spillover model, considered today as the main explanation of zoonotic emergence, does not match the virus dynamics and has somewhat misguided the way research has been conducted. We conclude this review by proposing a change of paradigm and model and introduce the circulation model for explaining the various aspects of the dynamics of SARS-CoV-2 emergence in humans.


Subject(s)
COVID-19/epidemiology , Genome, Viral , Models, Statistical , Pandemics , SARS-CoV-2/genetics , Zoonoses/epidemiology , Animals , COVID-19/transmission , COVID-19/virology , Chiroptera/virology , Eutheria/virology , Humans , Models, Genetic , Retrospective Studies , SARS-CoV-2/growth & development , SARS-CoV-2/pathogenicity , Stochastic Processes , Zoonoses/transmission , Zoonoses/virology
16.
Sci Rep ; 11(1): 18108, 2021 09 13.
Article in English | MEDLINE | ID: covidwho-1406409

ABSTRACT

The progress of the SARS-CoV-2 pandemic requires the design of large-scale, cost-effective testing programs. Pooling samples provides a solution if the tests are sensitive enough. In this regard, the use of the gold standard, RT-qPCR, raises some concerns. Recently, droplet digital PCR (ddPCR) was shown to be 10-100 times more sensitive than RT-qPCR, making it more suitable for pooling. Furthermore, ddPCR quantifies the RNA content directly, a feature that, as we show, can be used to identify nonviable samples in pools. Cost-effective strategies require the definition of efficient deconvolution and re-testing procedures. In this paper we analyze the practical implementation of an efficient hierarchical pooling strategy for which we have recently derived the optimum, determining the best ways to proceed when there are impediments to the use of the absolute optimum or when multiple pools are tested simultaneously and there are restrictions on the throughput time. We also show how the ddPCR RNA quantification and the nested nature of the strategy can be combined to perform self-consistency tests for a better identification of infected individuals and nonviable samples. The studies are useful to those considering pool testing for the identification of infected individuals.
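The authors' hierarchical strategy and its optimum are not reproduced here. As a minimal baseline, assuming a perfect assay and independent infections at prevalence p, the classic two-stage (Dorfman) scheme tests a pool of n samples once and retests every member of a positive pool, so the expected number of tests per person is 1/n + 1 - (1 - p)^n. The function names are illustrative.

```python
def dorfman_tests_per_person(n: int, p: float) -> float:
    """Expected tests per individual for two-stage (Dorfman) pooling.

    One pooled test is shared by n people, and all n are retested whenever
    the pool is positive, which happens with probability 1 - (1 - p)^n.
    Assumes a perfect test and independent infections with prevalence p.
    """
    if n == 1:
        return 1.0  # plain individual testing, no pooling stage
    return 1.0 / n + 1.0 - (1.0 - p) ** n


def best_pool_size(p: float, max_n: int = 100) -> int:
    """Pool size in 1..max_n minimising expected tests per person."""
    return min(range(1, max_n + 1), key=lambda n: dorfman_tests_per_person(n, p))
```

For p = 0.01, `best_pool_size(0.01)` returns 11, at about 0.196 tests per person, roughly a fivefold saving over individual testing. Nested schemes such as the one analysed in the paper add further stages to push this lower.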


Subject(s)
COVID-19 Nucleic Acid Testing/methods , COVID-19/diagnosis , Diagnostic Tests, Routine/methods , Real-Time Polymerase Chain Reaction/methods , SARS-CoV-2/genetics , Algorithms , COVID-19/epidemiology , COVID-19/virology , Communicable Diseases/diagnosis , Communicable Diseases/virology , Humans , Models, Genetic , Pandemics , RNA, Viral/genetics , Reproducibility of Results , SARS-CoV-2/physiology , Sensitivity and Specificity , Specimen Handling/methods
17.
Mol Biol Evol ; 38(4): 1537-1543, 2021 04 13.
Article in English | MEDLINE | ID: covidwho-1387956

ABSTRACT

The rooting of the SARS-CoV-2 phylogeny is important for understanding the origin and early spread of the virus. Previously published phylogenies have used different rootings that do not always provide consistent results. We investigate several different strategies for rooting the SARS-CoV-2 tree and provide measures of statistical uncertainty for all methods. We show that methods based on the molecular clock tend to place the root in the B clade, whereas methods based on outgroup rooting tend to place the root in the A clade. The results from the two approaches are statistically incompatible, possibly as a consequence of deviations from a molecular clock or excess back-mutations. We also show that none of the methods provide strong statistical support for the placement of the root in any particular edge of the tree. These results suggest that phylogenetic evidence alone is unlikely to identify the origin of the SARS-CoV-2 virus and we caution against strong inferences regarding the early spread of the virus based solely on such evidence.
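Clock-based rooting is commonly done by root-to-tip regression: for a candidate root, each tip's distance from the root is regressed on its sampling date, the slope estimates the substitution rate, the x-intercept estimates the date of the root, and the candidate root with the best linear fit is preferred. The sketch below covers only the regression for a single candidate root, with hypothetical dates and distances; it is not any of the specific rooting methods compared in the paper, and it inherits the sensitivity to clock deviations the abstract describes.

```python
from typing import List, Tuple


def root_to_tip_regression(dates: List[float], distances: List[float]) -> Tuple[float, float]:
    """Least-squares fit of distance = rate * (date - t_root).

    dates: sampling dates of the tips (e.g. decimal years)
    distances: root-to-tip distances (substitutions per site) for one candidate root
    Returns (rate, t_root); t_root is the x-intercept, where the fitted
    line reaches zero divergence.
    """
    n = len(dates)
    mx = sum(dates) / n
    my = sum(distances) / n
    sxx = sum((x - mx) ** 2 for x in dates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, distances))
    rate = sxy / sxx
    intercept = my - rate * mx
    return rate, -intercept / rate
```

With perfectly clock-like tips sampled at 2020.0 to 2020.3 and distances growing by 0.0001 per 0.1 year, the fit returns a rate of 0.001 per year and a root date of 2019.9; real SARS-CoV-2 data are far noisier, which is why the abstract stresses statistical uncertainty.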


Subject(s)
COVID-19/virology , Genome, Viral , Mutation , Phylogeny , SARS-CoV-2/genetics , Algorithms , Animals , Bayes Theorem , Evolution, Molecular , Humans , Likelihood Functions , Markov Chains , Models, Genetic , Models, Statistical , Monte Carlo Method , Mutation, Missense , RNA, Viral/genetics , Uncertainty
18.
PLoS Genet ; 16(12): e1009272, 2020 12.
Article in English | MEDLINE | ID: covidwho-1388879

ABSTRACT

The Betacoronaviruses comprise multiple subgenera whose members have been implicated in human disease. As with SARS, MERS and now SARS-CoV-2, the origin and emergence of new variants are often attributed to events of recombination that alter host tropism or disease severity. In most cases, recombination has been detected by searches for excessively similar genomic regions in divergent strains; however, such analyses are complicated by the high mutation rates of RNA viruses, which can produce sequence similarities in distant strains by convergent mutations. By applying a genome-wide approach that examines the source of individual polymorphisms and that can be tested against null models in which recombination is absent and homoplasies can arise only by convergent mutations, we examine the extent and limits of recombination in Betacoronaviruses. We find that recombination accounts for nearly 40% of the polymorphisms circulating in populations and that gene exchange occurs almost exclusively among strains belonging to the same subgenus. Although experimental studies have shown that recombinational exchanges occur at random along the coronaviral genome, in nature, they are vastly overrepresented in regions controlling viral interaction with host cells.
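The authors' genome-wide test against recombination-free null models is not reproduced here. A simpler, classic signal of the same confound is the four-gamete test, named here as a substitute technique: under an infinite-sites model without recombination, two biallelic sites can show at most three of the four possible haplotypes, so observing all four implies recombination or homoplasy (for example, the convergent mutations the abstract mentions). The sketch assumes gap-free, unambiguous, equal-length aligned sequences; the function name is illustrative.

```python
from itertools import combinations
from typing import List, Tuple


def four_gamete_violations(alignment: List[str]) -> List[Tuple[int, int]]:
    """Pairs of biallelic sites (i, j) at which all four two-site haplotypes occur."""
    n_sites = len(alignment[0])
    columns = [[seq[k] for seq in alignment] for k in range(n_sites)]
    biallelic = [k for k, col in enumerate(columns) if len(set(col)) == 2]
    hits = []
    for i, j in combinations(biallelic, 2):
        haplotypes = {(seq[i], seq[j]) for seq in alignment}
        if len(haplotypes) == 4:
            hits.append((i, j))
    return hits
```

The test alone cannot tell recombination from convergent mutation, which is exactly the ambiguity the abstract's null-model approach is designed to resolve.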


Subject(s)
Betacoronavirus/classification , Betacoronavirus/genetics , Recombination, Genetic/genetics , Spike Glycoprotein, Coronavirus/genetics , Crossing Over, Genetic/genetics , Genes, Viral/genetics , Genome, Viral/genetics , Host Specificity/genetics , Models, Genetic , Polymorphism, Genetic , SARS-CoV-2/classification , SARS-CoV-2/genetics , Viral Tropism/genetics
19.
Genome Biol Evol ; 13(10)2021 10 01.
Article in English | MEDLINE | ID: covidwho-1370777

ABSTRACT

Owing to a lag between a deleterious mutation's appearance and its selective removal, gold-standard methods for mutation rate estimation assume no meaningful loss of mutations between parents and offspring. Indeed, from analysis of closely related lineages, in SARS-CoV-2, the Ka/Ks ratio was previously estimated as 1.008, suggesting no within-host selection. By contrast, we find a higher number of observed SNPs at 4-fold degenerate sites than elsewhere and, allowing for the virus's complex mutational and compositional biases, estimate that the mutation rate is at least 49-67% higher than would be estimated based on the rate of appearance of variants in sampled genomes. Given the high Ka/Ks, one might assume that the majority of such intrahost selection is the purging of nonsense mutations. However, we estimate that selection against nonsense mutations accounts for only ∼10% of all the "missing" mutations. Instead, classical protein-level selective filters (against chemically disparate amino acids and those predicted to disrupt protein functionality) account for many missing mutations. It is less obvious why, for an intracellular parasite, amino acid cost parameters, notably amino acid decay rate, are also significant. Perhaps most surprisingly, we also find evidence for real-time selection against synonymous mutations that move codon usage away from that of humans. We conclude that there is common intrahost selection on SARS-CoV-2 that acts on nonsense, missense, and possibly synonymous mutations. This has implications for methods of mutation rate estimation, for determining times to common ancestry and the potential for intrahost evolution including vaccine escape.
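Ka/Ks compares nonsynonymous with synonymous substitutions per site, so a value near 1, such as the 1.008 cited above, is conventionally read as little net selection on protein sequence. A minimal sketch, assuming the nonsynonymous and synonymous site and difference counts are already available (counting them, for example by the Nei-Gojobori method, is omitted), applies the Jukes-Cantor correction to each proportion and takes the ratio. The function names and the counts in the test are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a proportion of differing sites for multiple hits."""
    if p >= 0.75:
        raise ValueError("proportion too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ratio of corrected nonsynonymous (Ka) to synonymous (Ks) divergence."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka / ks
```

Equal proportions of differences at nonsynonymous and synonymous sites give a ratio of 1. The abstract's point is that this observed ratio can hide within-host purging, because mutations removed before sampling never enter either count.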


Subject(s)
COVID-19/virology , Mutation , SARS-CoV-2/genetics , Codon Usage , Codon, Nonsense , Evolution, Molecular , Humans , Models, Genetic , Mutation Rate , Mutation, Missense , Polymorphism, Single Nucleotide , Selection, Genetic , Silent Mutation
20.
Syst Biol ; 71(2): 426-438, 2022 02 10.
Article in English | MEDLINE | ID: covidwho-1358488

ABSTRACT

Phylogenetic trees from real-world data often include short edges with very few substitutions per site, which can lead to partially resolved trees and poor accuracy. Theory indicates that the number of sites needed to accurately reconstruct a fully resolved tree grows at a rate proportional to the inverse square of the length of the shortest edge. However, when inferred trees are partially resolved due to short edges, "accuracy" should be defined as the rate of discovering false splits (clades on a rooted tree) relative to the actual number found. Thus, accuracy can be high even if short edges are common. Specifically, in a "near-perfect" parameter space in which trees are large, the tree length ξ (the sum of all edge lengths) is small, and rate variation is minimal, the expected false positive rate is less than ξ/3; the exact value depends on tree shape and sequence length. This expected false positive rate is far below the false negative rate for small ξ and often well below 5% even when some assumptions are relaxed. We show this result analytically for maximum parsimony and explore its extension to maximum likelihood using theory and simulations. For hypothesis testing, we show that measures of split "support" that rely on bootstrap resampling consistently imply weaker support than that implied by the false positive rates in near-perfect trees. The near-perfect parameter space closely fits several empirical studies of human virus diversification during outbreaks and epidemics, including Ebolavirus, Zika virus, and SARS-CoV-2, reflecting low substitution rates relative to high transmission/sampling rates in these viruses. [Ebolavirus; epidemic; HIV; homoplasy; mumps virus; perfect phylogeny; SARS-CoV-2; virus; West Nile virus; Yule-Harding model; Zika virus.]


Subject(s)
COVID-19 , Zika Virus Infection , Zika Virus , Humans , Models, Genetic , Phylogeny , SARS-CoV-2